doi: 10.17586/2226-1494-2024-24-2-241-248


ViSL One-shot: generating Vietnamese sign language data set

K. Dang, I. A. Bessmertny


Article in English

For citation:
Dang Khanh, Bessmertny I.A. ViSL One-shot: generating Vietnamese sign language data set. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 2, pp. 241–248. doi: 10.17586/2226-1494-2024-24-2-241-248


Abstract
The development of methods for automatic recognition of objects in a video stream, and of sign language recognition in particular, requires large amounts of video data for training. An established data-enrichment method in machine learning is to distort images and add noise. Linguistic gestures differ from other gestures in that a small change in posture can radically change the meaning of a gesture, which imposes specific requirements on data variability. The novelty of the proposed method is that, instead of distorting frames with affine image transformations, the pose of the sign language speaker is vectorized and then perturbed with noise in the form of random deviations of skeletal elements. To implement controlled gesture variability, the MediaPipe library is used to convert each pose into a vector format in which every vector corresponds to a skeletal element; the image of the figure is then restored from the vector representation. The advantage of this method is the possibility of controlled gesture distortion that corresponds to real deviations in the postures of sign language speakers. The developed video-data enrichment method was tested on a set of 60 words of Indian Sign Language (common to the languages and dialects spoken across India), represented by 782 video fragments. For each word, the most representative gesture was selected and 100 variations were generated; the remaining, less representative gestures were used as test data. The resulting word-level classification and recognition model based on a GRU-LSTM neural network achieves an accuracy above 95 %. The validated method was then applied to a corpus of 4364 videos in Vietnamese Sign Language covering all three regions of Vietnam: Northern, Central, and Southern. This produced 436,400 data samples (100 variations per word, with varying degrees of deviation from the reference gesture) that can be used to develop and improve Vietnamese sign language recognition methods. A disadvantage of the proposed method is that its accuracy depends on the error of the MediaPipe library. The created video dataset can also be used for automatic sign language translation.
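The augmentation scheme described above (convert joints to skeletal vectors, add controlled random deviations, restore joint positions) can be sketched as follows. This is a minimal illustration with a toy four-joint chain and hypothetical noise bounds; in the paper the joint coordinates would come from MediaPipe pose landmarks, and the bone topology and deviation limits are assumptions, not the authors' exact implementation.

```python
import numpy as np

# Toy skeleton: each bone is a (parent_joint, child_joint) pair.
# A real pipeline would use the MediaPipe landmark topology instead.
BONES = [(0, 1), (1, 2), (2, 3)]  # e.g. shoulder -> elbow -> wrist -> fingertip

def to_bone_vectors(joints, bones=BONES):
    """Convert absolute joint coordinates to per-bone vectors."""
    return np.array([joints[c] - joints[p] for p, c in bones])

def perturb(vectors, max_angle_deg=5.0, max_scale=0.02, rng=None):
    """Add controlled noise: rotate each bone vector by a small random
    angle and jitter its length, mimicking natural pose variation.
    The bounds keep the gesture's meaning intact."""
    if rng is None:
        rng = np.random.default_rng()
    out = []
    for v in vectors:
        theta = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
        c, s = np.cos(theta), np.sin(theta)
        rotated = np.array([[c, -s], [s, c]]) @ v
        out.append(rotated * (1.0 + rng.uniform(-max_scale, max_scale)))
    return np.array(out)

def to_joints(root, vectors, bones=BONES, n_joints=4):
    """Restore absolute joint positions from the (perturbed) bone vectors."""
    joints = np.zeros((n_joints, 2))
    joints[0] = root
    for (p, c), v in zip(bones, vectors):
        joints[c] = joints[p] + v
    return joints

# Generate variations of one reference pose.
reference = np.array([[0.0, 0.0], [0.3, 0.0], [0.55, 0.1], [0.6, 0.15]])
variations = [to_joints(reference[0], perturb(to_bone_vectors(reference)))
              for _ in range(100)]
```

Because the noise is applied per skeletal vector rather than per pixel, each variation remains a plausible pose of the same gesture, which is the property affine image distortions cannot guarantee.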

Keywords: Vietnamese sign language, Indian Sign Language, sign language recognition, MediaPipe, coordinate transformation, vector space, random noise, GRU-LSTM, one-shots, data augmentation



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License